Heidi M. Sosik and Robert J. Olson. Automated taxonomic classification of phytoplankton sampled with imaging in-flow cytometry. Limnol. Oceanogr.: Methods 5, 2007, 204–216
Abstract
High-resolution photomicrographs of phytoplankton cells and chains can now be acquired with imaging-in-flow systems at rates that make manual identification impractical for many applications. To address the challenge of automated taxonomic identification of images generated by our custom-built submersible Imaging FlowCytobot, we developed an approach that relies on extraction of image features, which are then presented to a machine learning algorithm for classification. Our approach uses a combination of image feature types including size, shape, symmetry, and texture characteristics, plus orientation-invariant moments, diffraction pattern sampling, and co-occurrence matrix statistics. Some of these features required preprocessing with image analysis techniques including edge detection after phase congruency calculations, morphological operations, boundary representation and simplification, and rotation. For the machine learning strategy, we developed an approach that combines a feature selection algorithm and use of a support vector machine specified with a rigorous parameter selection and training approach. After training, a 22-category classifier provides 88% overall accuracy for an independent test set, with individual category accuracies ranging from 68% to 99%. We demonstrate application of this classifier to a nearly uninterrupted 2-month time series of images acquired in Woods Hole Harbor, including use of statistical error correction to derive quantitative concentration estimates, which are shown to be unbiased with respect to manual estimates for random subsamples. Our approach, which provides taxonomically resolved estimates of phytoplankton abundance with fine temporal resolution (hours for many species), permits access to scales of variability from tidal to seasonal and longer.

Acknowledgments
This research was supported by grants from NSF (Biocomplexity IDEA program and Ocean Technology and Interdisciplinary Coordination program; OCE-0119915 and OCE-0525700) and by funds from the Woods Hole Oceanographic Institution (Ocean Life Institute, Coastal Ocean Institute, Access to the Sea Fund, and the Bigelow Chair). We are indebted to Alexi Shalapyonok for expert assistance in the lab and field; to Melissa Patrician for hours of manual image classification; to Cabell Davis, Qiao Hu, Kacey Li, Mike Neubert, and Andy Solow for insights into image processing, machine learning, and statistical problems; and to the Martha’s Vineyard Coastal Observatory operations team, especially Janet Fredericks, for logistical support.

Imaging-in-flow systems such as the submersible Imaging FlowCytobot (Olson and Sosik 2007) promise to revolutionize the ability to sample phytoplankton communities at ecologically relevant scales. They also, however, present new challenges for data analysis and interpretation. For instance, Imaging FlowCytobot can generate more than 10 000 high-quality plankton (and/or detritus) images every hour, and it can do so every day for months. This volume of data precludes manual inspection for cell identification as a feasible tool for many applications. If adequate analysis techniques can be developed for large data sets of plankton images, the results will bring new insight into a range of ecological phenomena including bloom dynamics, species succession, and spatial and temporal patchiness.
With these applications in mind, an initial goal for analysis of image datasets is to quantify abundance accurately for a wide range of taxa present in mixed assemblages. To do this requires efficient and accurate identification of individual plankton images. This kind of classification problem has been addressed previously in particular applications involving plankton images. An important area of focus has arisen in response to availability of imaging systems optimized for observations of metazooplankton (> ~0.1 mm); these include systems designed for underwater measurements of live organisms, such as the Video Plankton Recorder (VPR) (Davis et al. 1992) and the Shadow Image Particle Profiling Evaluation Recorder (SIPPER) (Samson et al. 2001), as well as the ZOOSCAN system for automated measurement of preserved samples (Grosjean et al. 2004). Davis and co-workers (Tang et al. 1998; Davis et al. 2004; Hu and Davis 2005) have made important contributions in developing several approaches for rapid analysis of plankton images generated by the VPR. This group has explored use of image characteristics (or features) such as invariant moments, granulometry, and co-occurrence matrices and use of machine-learning methods including learning vector quantization neural networks and support vector machines. In another approach, Luo et al. (2004), working with SIPPER-generated images, also addressed some of the challenges of including image features (such as area and transparency) that require accurate detection of the organism boundary within an image.

Compared to the case for zooplankton, efforts to automatically analyze and identify phytoplankton images have been more limited, although some recent progress suggests that new developments are likely to be productive. In an early demonstration example, Gorsky et al. (1989) showed that simple geometric properties were sufficient to reliably distinguish 3 species with distinct size and shape. In a similar study, Embleton et al. (2003) were able to define a neural network to identify 4 very distinct species from microscopic images of lake water samples, with accuracy sufficient to resolve seasonal patterns in total cell volume. In another example involving several dinoflagellate species from the same genus, Culverhouse et al. (2003) argued that a neural network approach can achieve accuracy similar to manual identification by trained personnel. Culverhouse et al. (2006) have proposed that this be implemented for detection of harmful algal species, although the ability of their HAB Buoy system to acquire cell images of sufficient quality remains to be demonstrated. There has also been considerable effort to develop species-level automated classification techniques for diatoms from ornamentation and shape details of cleaned frustules (du Buf and Bayer 2002 and chapters therein, e.g., Fischer and Bunke 2002). Most recently, for the special case of Trichodesmium spp. present in colonies large enough for detection with the VPR, automated analysis has provided striking ecological and biogeochemical insights (Davis and McGillicuddy 2006). These examples from previous work point to the utility of automated image processing and classification techniques for problems in phytoplankton identification, but they all address a relatively narrow scope in terms of taxonomic range or image type (e.g., cleaned frustules). Blaschko et al.
(2005) highlighted the challenges of moving beyond this level by presenting results with ~50% to 70% accuracy for a 12-category (plus “unknown”) image classification problem involving a variety of phytoplankton groups. For adequate ecological characterization of many natural marine phytoplankton assemblages, the relevant image analysis and classification problem is broad (taxonomically diverse) and must accommodate many categories (10–20, or more). Taxonomic breadth necessarily means a wide range of cell sizes and relevant identifying characters. Moreover, for images collected automatically over long periods of time, such as from Imaging FlowCytobot, it is critical that techniques are robust to a range of sampling conditions (e.g., changes in co-occurring taxa and variations in image quality related to lighting and focus).

Here we describe a technique to address these challenges by combining selected image processing methods, machine-learning-based classification, and statistical error correction to estimate taxonomically resolved phytoplankton abundance from high-resolution (~1 μm) images. Whereas the general approach is independent of the particular image acquisition system, we focus on data collected with Imaging FlowCytobot. Our approach builds on previous efforts in image classification for plankton, as well as some other image processing and classification applications such as face recognition and fingerprint recognition, while addressing the particular combination of image characteristics and identification markers relevant for Imaging FlowCytobot measurements of nano- and microphytoplankton in assemblages of highly mixed taxonomy. By characterizing temporal variability in a natural plankton community, we demonstrate that our approach achieves the overall goal of automatic classification of a wide variety of image types, with emphasis on morphologically distinct taxonomic groupings and accurate estimation of group abundance.

Materials and procedures

Our approach involves 5 main steps: 1) image processing and extraction of features (characteristics or properties), 2) feature selection to identify an optimal subset of characteristics for multi-category discrimination, 3) design, training, and testing of a machine learning algorithm for classification (on the basis of selected features as input), 4) statistical analyses to estimate category-specific misclassification probabilities for accurate abundance estimates and for quantification of uncertainties in abundance estimates following the approach of Solow et al. (2001), and 5) application of the resulting feature extraction, classifier algorithm, and statistical correction sequence to sets of unknown images.

Image data sets—The images used to develop, assess, and demonstrate our methods were collected with a custom-built imaging-in-flow cytometer (Imaging FlowCytobot) analyzing water from Woods Hole Harbor. All sampling was done between late fall and early spring in 2004 and 2005. Here we provide a brief summary of Imaging FlowCytobot design and image characteristics; details are available elsewhere (Olson and Sosik 2007). Imaging FlowCytobot uses a combination of flow cytometric and video technology to both capture images of organisms for identification and measure chlorophyll fluorescence and scattered light associated with each imaged particle. Its submersible and autonomous aspects were patterned after successes with the original FlowCytobot (Olson et al.
2003), while the addition of cell imaging capability and a design with higher sample volumes are critical for the application to microplankton. Imaging FlowCytobot uses a customized quartz flow cell (800 by 180 μm channel), with hydrodynamic focusing of a seawater sample stream in a sheath flow of filtered seawater to carry cells in single file through a red (635 nm) diode laser beam. Each cell passing through the laser beam scatters laser light, and chlorophyll-containing cells emit red (680 nm) fluorescence. Fluorescence signals are then used to trigger a xenon flashlamp strobe to emit a 1-μs flash of light, which illuminates the flow cell after passing through a green bandpass filter (514 nm). A monochrome CCD camera (1380 by 1034 pixels) and a frame grabber board are used to capture an 8-bit grayscale image of the corresponding cell. A 10× microscope objective focused on the flow cell is used to collect the images, as well as the scattered light and fluorescence from cells as they traverse the laser beam. This combination of fluidics and optical configuration provides images with target objects in consistent focus and typically having their major axis oriented with the longer axis of the camera field (i.e., along laminar flow lines). As described in Olson and Sosik (2007), the resulting images (considering the effects of magnification, camera resolution, and cell motion during flash exposure) can be resolved to approximately 1 μm, with the full camera field spanning ~300 by 400 μm. In real time, binary thresholding and a “blob” analysis algorithm (ActiveMIL 7.5, Matrox Electronic Systems Ltd.) are used to record only rectangular subregions of the camera field that contain cells or other objects (along with some adjacent background).

Manual inspection of many images from our Woods Hole Harbor data set led us to define 22 explicit categories that represent subjective consideration of taxonomic knowledge, ecological perspective, and practical issues regarding groupings that can be feasibly distinguished from morphology visible in the images (Fig. 1; see also Appendix A). Many of the categories correspond to phytoplankton taxa at the genus level or groups of a few morphologically similar genera. Diatoms account for most of these categories: 1) Asterionellopsis spp.; 2) Chaetoceros spp.; 3) Cylindrotheca spp.; 4) Cerataulina spp. plus the morphologically similar species of Dactyliosolen such as D. fragilissimus (all having many small distributed chloroplasts; category labeled DactFragCeratul in figures and tables); 5) other species of Dactyliosolen morphologically similar to D. blavyanus (with chloroplasts typically concentrated in a small area within the frustule); 6) Ditylum spp.; 7) Guinardia spp. plus occasional representatives of Hemiaulus spp.; 8) Licmophora spp.; 9) Pleurosigma spp.; 10) Pseudonitzschia spp.; 11) Rhizosolenia spp. plus rare occurrences of Proboscia spp.; 12) Skeletonema spp.; and 13) Thalassiosira spp. plus similar centric diatoms. Nondiatom genera are 14) Dinobryon spp.; 15) Euglena spp., plus other euglenoid genera; and 16) Phaeocystis spp.
In addition to the genus-level categories, we defined several mixtures of morphologically similar particles and cell types: 17) various forms of ciliates; 18) various genera of dinoflagellates > ~10 μm in width; 19) a mixed group of nanoflagellates; 20) single-celled pennate diatoms (not belonging to any of the other diatom groups); 21) other cells < ~20 μm that cannot be taxonomically identified from the images; plus 22) a category for “detritus,” noncellular material of various shapes and sizes.

Fig. 1. Example images from 22 categories identified from Woods Hole Harbor water. Most categories are phytoplankton taxa at the genus level: Asterionellopsis spp. (A); Chaetoceros spp. (B); Cylindrotheca spp. (C); Cerataulina spp. plus the morphologically similar species of Dactyliosolen such as D. fragilissimus (D); other species of Dactyliosolen morphologically similar to D. blavyanus (E); Dinobryon spp. (F); Ditylum spp. (G); Euglena spp. plus other euglenoids (H); Guinardia spp. (I); Licmophora spp. (J); Phaeocystis spp. (K); Pleurosigma spp. (L); Pseudonitzschia spp. (M); Rhizosolenia spp. and rare cases of Proboscia spp. (N); Skeletonema spp. (O); Thalassiosira spp. and similar centric diatoms (P). The remaining categories are mixtures of morphologically similar particles and cell types: ciliates (Q); detritus (R); dinoflagellates > ~20 μm (S); nanoflagellates (T); other cells < 20 μm (U); and other single-celled pennate diatoms (V).

For development and testing of the analysis and classification approach, we compiled a set of 6600 images that were visually inspected and manually identified, with even distribution across the 22 categories described above (i.e., 300 images per category). These identified images were randomly split into “training” and “test” sets, each containing 150 images from each category (see Appendix A for full image sets, provided here to facilitate future comparison with other methods applicable to this problem). Independent of the training and test sets, we also inspected every image acquired during randomly selected periods of natural sample analysis (~27 000 images, in sample volumes ranging from 5 to 50 mL, measured over the period February to April 2005) for manual identification; this allowed specification of misclassification probabilities under real sampling conditions and evaluation of error correction procedures (described below) for accurate abundance estimates.

Image processing and feature extraction—Our first objective was to produce, for each image, a standard set of feature values (characteristics or properties) which might be useful for discriminating among the 22 categories. We specified the standard feature set by considering characteristics that seem important for identification of images by human observers and on the basis of previous successes in related image classification problems. All image processing and feature extraction was done with the MATLAB software package (version 7.2; Mathworks, Inc.), including the associated Image Processing Toolbox (version 5.2; Mathworks, Inc.). We also incorporated algorithms described in Gonzalez et al. (2004) and implemented in the accompanying toolbox Digital Image Processing for MATLAB (DIPUM) (version 1.1.3; imageprocessingplace.com). For each image, the result of all feature extraction is a 210-element vector containing values that reflect various aspects of object size, shape, and texture, as described in more detail below (see Table 1).
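To make the count-correction step concrete, the following is a minimal numerical sketch of the idea behind the misclassification-probability correction cited above (step 4, following Solow et al. 2001); it is not the exact estimator from that paper. The matrix Q, the counts, and the sample volume are invented for illustration, and lsqnonneg is used only as one convenient way to solve the resulting linear system with nonnegative abundances.

```matlab
% Illustrative sketch of confusion-matrix-based count correction.
% Q(i,j) = probability that a particle truly in category j is assigned to
% category i by the classifier, estimated from manually identified images
% collected under real sampling conditions (columns sum to 1).
Q = [0.90 0.05 0.10; ...
     0.06 0.88 0.05; ...
     0.04 0.07 0.85];

c = [420; 130; 250];          % raw automated counts for 3 example categories

% Expected relationship: c is approximately Q*n, where n holds the true counts.
% Solve for n with a nonnegativity constraint to avoid negative abundances.
n_hat = lsqnonneg(Q, c);

% Convert corrected counts to concentration using the analyzed sample volume.
volume_mL = 50;               % hypothetical volume analyzed
conc_per_mL = n_hat / volume_mL;

disp([c, n_hat, conc_per_mL]) % raw count, corrected count, cells per mL
```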
The original grayscale image is used to derive some features, but various stages of image processing are required for others (Table 1). As a first step, many of the features we calculate require information about the boundary of the targets of interest (or “blobs”) within an image, so preliminary image processing is critical for edge detection and boundary segmentation. We found that conventional edge detection algorithms were inadequate for reliable automated boundary determination over the range of image characteristics and plankton morphologies that we encounter with Imaging FlowCytobot. Approaches relying on intensity gradients, such as the commonly used Canny algorithm, could not be optimized to avoid artifacts from noise and illumination variations while reliably detecting challenging cell features such as spines, flagella, and localized areas that vary from brighter to darker than the background. For this reason, we turned to a computationally intensive but effective approach based on calculation of the noise-compensated phase congruency in an image (Kovesi 1999), as implemented in MATLAB by Kovesi (2005). Phase congruency is independent of contrast and illumination, and we found that simple threshold-based edge detection applied to phase congruency images provides excellent results for a wide range of phytoplankton cell characteristics (Fig. 2A–C). After edge detection, we used standard MATLAB functions for morphological processing (closing, dilation, thinning) and for segmentation algorithms to define blobs or connected regions (Fig. 2D). For some feature calculations (e.g., symmetry measures), images were also rotated about the centroid of the largest blob to align the longest axis horizontally (compare Fig. 2C and D). Finally, we used DIPUM toolbox functions to reconstruct a simplified boundary of the largest blob in each image on the basis of the first 10% of the Fourier descriptors (Fig. 2E) (Gonzalez et al. 2004).

For the largest blob in each field (Fig. 2D), we calculate a set of relatively common geometric features such as major and minor axis length, area and filled area, perimeter, equivalent spherical diameter, eccentricity, and solidity (MATLAB Image Processing Toolbox, regionprops function), as well as several simple shape indicators (e.g., ratio of major to minor axis lengths, ratio of area to squared perimeter). For more detailed shape and symmetry measures, calculations were done on the rotated blob images and simplified boundaries.

Table 1. Summary of the different feature types determined for each image, specifying algorithm source and the stage of image processing at which the features are calculated.
Feature type | Algorithm or code source | Image processing stage | No. features | No. selected
Simple geometry | MATLAB Image Processing Toolbox | Blob image | 18 | 17
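As a rough sketch of the processing stages just described, the MATLAB fragment below chains together threshold-based edge detection on a phase congruency image, morphological cleanup, selection of the largest blob, rotation to a horizontal major axis, and a handful of the simple geometric features. It assumes Peter Kovesi's phase congruency code (e.g., phasecong2.m) is on the path; the input file name, the threshold, the structuring element, and the use of hole filling in place of the exact closing/dilation/thinning sequence are illustrative choices, not the settings used in the paper.

```matlab
% Sketch of the boundary-detection and simple-geometry feature stage.
% Assumes Kovesi's phasecong2.m is on the MATLAB path; parameter values
% are placeholders for illustration only.
img = im2double(imread('example_cell.png'));   % hypothetical grayscale input image

pc = phasecong2(img);                 % phase-congruency edge strength (Kovesi 2005)
bw = pc > 0.3;                        % simple threshold-based edge detection
bw = imclose(bw, strel('disk', 2));   % close small gaps in the detected edges
bw = imfill(bw, 'holes');             % fill enclosed regions to form solid blobs

% Keep only the largest connected region ("blob").
lbl = bwlabel(bw);
stats = regionprops(lbl, 'Area');
[~, k] = max([stats.Area]);
blob = (lbl == k);

% Rotate so the blob's major axis lies along the horizontal image axis
% (used for the symmetry-related features).
ori = regionprops(blob, 'Orientation');
blobRot = imrotate(blob, -ori.Orientation);

% A few of the simple geometric features computed with regionprops.
g = regionprops(blobRot, 'MajorAxisLength', 'MinorAxisLength', 'Area', ...
    'FilledArea', 'Perimeter', 'EquivDiameter', 'Eccentricity', 'Solidity');
aspectRatio = g.MajorAxisLength / g.MinorAxisLength;  % major:minor axis ratio
compactness = g.Area / g.Perimeter^2;                 % area to squared perimeter
featureVec = [g.MajorAxisLength, g.MinorAxisLength, g.Area, g.FilledArea, ...
    g.Perimeter, g.EquivDiameter, g.Eccentricity, g.Solidity, ...
    aspectRatio, compactness];
```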
Similar articles
Plankton Analysis by Automated Submersible Imaging Flow Cytometry: Transforming a Specialized Research Instrument into a Broadly Accessible Tool and Extending its Target Size Range
Detailed knowledge of the composition and characteristics of the particles suspended in seawater is crucial to an understanding of the biology, optics, and geochemistry of the oceans. The composition and size distribution of the phytoplankton community, for example, help determine the flow of carbon and nutrients through an ecosystem and can be important indicators of how coastal environments r...
Continuous automated imaging-in-flow cytometry for detection and early warning of Karenia brevis blooms in the Gulf of Mexico.
Monitoring programs for harmful algal blooms (HABs) typically rely on time-consuming manual methods for identification and enumeration of phytoplankton, which make it difficult to obtain results with sufficient temporal resolution for early warning. Continuous automated imaging-in-flow by the Imaging FlowCytobot (IFCB) deployed at Port Aransas, TX has provided early warnings of six HAB events. ...
Growth rates of coastal phytoplankton from time-series measurements with a submersible flow cytometer
Our understanding of the dynamics of phytoplankton communities has been limited by the space and timescales associated with traditional monitoring approaches. To overcome some of these limitations, we have developed a submersible flow cytometer (FlowCytobot) that is designed for extended autonomous monitoring of phytoplankton abundance, cell size, and pigmentation. FlowCytobot was moored on the...
Measuring the ecological significance of microscale nutrient patches
Charcoal analysis in marine sediments